ABSTRACT

ZiFDB (Zinc Finger Database, http://zifdb.msi.umn.edu) is a
web-accessible database that houses information on individual
C2H2 zinc fingers (ZFs) and engineered zinc finger arrays
(ZFAs). ZiFDB serves as a resource for biologists interested
in engineering ZFAs for use as sequence-specific DNA-binding
reagents. Here, we describe four new features of ZiFDB:
(i) the database allows users to input new ZFs and ZFAs;
(ii) a shadow database temporarily stores user-submitted data,
pending approval by the database curator and subsequent
loading into the persistent database; (iii) ZiFDB contains
181 Context-Dependent Assembly (CoDA) ZFAs, which were
generated by this newly described ZFA engineering platform;
and (iv) the database also now contains 319 F1F2 CoDA units
and 334 F2F3 CoDA units that can be used to construct CoDA
arrays. In total, the new release of ZiFDB contains 1226 ZFs
and 1123 ZFAs.



INTRODUCTION 

The C2H2 zinc finger (ZF) motif, which was first described in
the transcription factor TFIIIA from Xenopus laevis (1), is one
of the most abundant DNA-binding motifs in nature. Each ZF
comprises about 30 amino acids that fold into a ββα structure
through hydrophobic interactions and binding of a zinc ion by
two conserved cysteine and two conserved histidine residues.
A ZF typically recognizes a continuous 3-bp DNA sequence.
Owing to this DNA-binding capacity, ZFs serve as a framework
for constructing engineered DNA-binding proteins: ZFs are
linked in tandem to form zinc finger arrays (ZFAs), which can
recognize extended DNA sequences (2,3).



The methods used to engineer ZFAs to recognize novel target
sequences can be classified into two categories: modular
assembly and selection-based methods. Modular assembly simply
involves linking together ZFs that recognize known target
sequences (3). However, modular assembly is not necessarily
reliable. For example, in one study, more than half of the
ZFAs created by modular assembly showed little or no activity
(4). This is because the specificity and affinity of a given
finger are influenced by context, i.e. the position of the
finger in an array and its neighboring fingers. Selection-based
methods identify fingers that work well together and are thus
more reliable for producing functional ZFAs; however,
selection-based methods often require considerable time and a
high level of molecular biology expertise to perform. In 2008,
the Zinc Finger Consortium implemented a selection-based
platform called Oligomerized Pool Engineering (OPEN) (5).
Since its debut, >500 ZFAs have been generated by OPEN
[558 are housed in ZiFDB v2.0].

Despite the high efficacy of OPEN, the labor and expertise
required by this method have prevented it from being widely
used. With the goal of combining the simplicity of modular
assembly with the reliability of OPEN, another publicly
available platform, named Context-Dependent Assembly (CoDA),
was described in 2011 by the Zinc Finger Consortium (6).
A CoDA ZFA is assembled from two two-finger units, each
derived through selection, that share a common ZF at position
two (F2). For example, an F1F2 CoDA unit (recognizing
3'-GAGGGG) is fused to an F2F3 CoDA unit (recognizing
3'-GGGGTG) such that the resulting three-finger array
recognizes a novel DNA sequence (3'-GAGGGGGTG). Although
Moore et al. (7) showed that the activity of CoDA zinc finger
nucleases (ZFNs) is lower than that of ZFNs made by OPEN,
CoDA does not require selection steps and is therefore easier
for most researchers to use.
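
The joining rule in the CoDA example above can be sketched in a
few lines of Python. This is an illustration only, not code from
ZiFDB or the CoDA platform: the class and function names are
hypothetical, and the check compares F2 triplets as a simple
stand-in for the requirement that both units share the same F2
finger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoFingerUnit:
    """A two-finger unit (hypothetical type); triplets are written 3' first, as in the text."""
    first_triplet: str   # triplet bound by the unit's first finger
    second_triplet: str  # triplet bound by the unit's second finger


def assemble_coda_site(f1f2: TwoFingerUnit, f2f3: TwoFingerUnit) -> str:
    """Return the 9-bp site recognized by the array built from an F1F2 and an F2F3 unit."""
    # Simplification: a matching F2 triplet stands in for a shared F2 finger.
    if f1f2.second_triplet != f2f3.first_triplet:
        raise ValueError("units do not share a common F2")
    return f1f2.first_triplet + f1f2.second_triplet + f2f3.second_triplet


f1f2 = TwoFingerUnit("GAG", "GGG")  # recognizes 3'-GAGGGG
f2f3 = TwoFingerUnit("GGG", "GTG")  # recognizes 3'-GGGGTG
print("3'-" + assemble_coda_site(f1f2, f2f3))  # 3'-GAGGGGGTG
```
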

Regardless of the method used to construct ZFAs, these
engineered DNA-binding domains have become powerful tools for
both basic and applied biological research. Engineered zinc
finger transcription factors, for example, can be created by
fusing transcriptional repressor or activator domains to
engineered ZFAs. These artificial transcription factors have
been used to repress or activate genes in a variety of species
with a high degree of specificity (8,9). Similarly, engineered
ZFNs have proven effective as targeted mutagens in diverse
eukaryotes (10,11). ZFNs are typically composed of a
customized array of ZFs fused to the non-specific FokI
restriction endonuclease cleavage domain (12). As FokI needs
to dimerize to be functional, ZFAs are designed in pairs to
recognize two unique DNA sequences separated by a short DNA
spacer. Binding to the two target sequences allows FokI to
dimerize, cut the DNA and introduce a double-stranded break in
the spacer. When double-stranded breaks are repaired through
non-homologous end-joining or homologous recombination,
targeted sequence modifications can be introduced at or near
the break site.

ZiFDB is designed to serve as a resource for those interested
in engineering custom ZFAs or better understanding how ZF
proteins recognize target DNA (13). In addition to housing
information on ZFs and engineered ZFAs, ZiFDB is linked to the
output from ZiFiT, a software package that assists biologists
in finding sites within target genes for engineering ZF
proteins (14). Consequently, ZiFDB is particularly valuable
for determining whether a given ZFA (or portion thereof) has
previously been constructed and whether it has the requisite
DNA-binding activity for a given experiment. ZiFDB v2.0 houses
an expanded number of ZFAs, from 652 to 1123. Likewise, the
number of ZFs has expanded from 716 to 1126. The updated
database also allows users to input ZFAs into a shadow
database; these entries are then deposited into the persistent
database by the curator on approval. The information in this
database will continue to help molecular biologists develop ZF
reagents that meet their needs for genome modification.

NEW FEATURES 

Shadow database 

ZiFDB v2.0 allows users to enter information about novel ZFAs
directly into the database. To ensure data quality, a shadow
database has been created.



Figure 1. Sample output from the CoDA unit search page when GGA
is provided as triplet F1, GGC as triplet F2 and GGG as triplet
F3. The page lists one F1F2 CoDA unit (triplets GGA and GGC;
recognition helices RPSKLVL and LKEHLTR) and one F2F3 CoDA unit
(triplets GGC and GGG) that shares the F2 helix LKEHLTR. Seven
amino acids are given for each recognition helix, corresponding
to positions -1, 1, 2, 3, 4, 5 and 6 relative to the start of
the helix; residues -1, 3 and 6 frequently make DNA contacts.
Entering only subsites F1 and F2 returns only F1F2 CoDA units;
entering only subsites F2 and F3 returns only F2F3 CoDA units.

The information submitted by the user is collected into the
shadow database and held pending approval by the database
curator. On approval, the information is then loaded into the
persistent database. The information collected into the shadow
database includes array name and array type (e.g. if it was
derived by OPEN, CoDA or modular assembly), the sequence of
the targeted DNA triplets, the sequences of the recognition
helices of the ZFs, journal information if the array has been
published and information about the submitter (see below for
additional details).
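
The hold-then-approve flow described above can be sketched as
follows. This is a minimal in-memory illustration, not ZiFDB's
implementation: the class names, method names and the `reject`
operation are assumptions, and only a few of the collected
fields are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Submission:
    """Subset of the fields collected on submission (names are hypothetical)."""
    array_name: str
    array_type: str       # e.g. "OPEN", "CoDA" or "modular assembly"
    triplets: List[str]   # targeted DNA triplets, one per finger


@dataclass
class CuratedStore:
    """Two stores: entries wait in `shadow` until a curator approves them."""
    shadow: Dict[int, Submission] = field(default_factory=dict)
    persistent: Dict[int, Submission] = field(default_factory=dict)
    _next_id: int = 1

    def submit(self, submission: Submission) -> int:
        """Hold a user submission in the shadow store and return its id."""
        submission_id = self._next_id
        self._next_id += 1
        self.shadow[submission_id] = submission
        return submission_id

    def approve(self, submission_id: int) -> None:
        """Curator step: move an entry from the shadow to the persistent store."""
        self.persistent[submission_id] = self.shadow.pop(submission_id)

    def reject(self, submission_id: int) -> None:
        """Curator step (assumed): discard an entry that fails review."""
        del self.shadow[submission_id]


store = CuratedStore()
sid = store.submit(Submission("demo-array", "CoDA", ["GGA", "GGC", "GGG"]))
store.approve(sid)
print(sid in store.persistent, sid in store.shadow)
```

Keeping the two stores separate means that queries against the
persistent store never see unreviewed entries.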

New database content 
Classes 

ZiFDB stores ZF information as a set of objects defined by
Java classes. The previous version of ZiFDB had four major
classes: Zinc Finger, Zinc Finger Array, Article and Author.
Two additional classes were added to this release of ZiFDB to
accommodate the two-finger CoDA units (designated as CoDA_F1F2
and CoDA_F2F3). Similar to the Zinc Finger and Zinc Finger
Array class designations, both CoDA_F1F2 and CoDA_F2F3 point
to the Article class, which provides information about
relevant publications. Information about submitters of the new
arrays is integrated into ZiFDB's existing Author class.
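
As a rough picture of how the two-finger classes relate to the
others, the sketch below models them as Python dataclasses.
ZiFDB itself uses Java classes whose attributes are not listed
here, so every field name is an assumption, loosely based on the
columns reported by the CoDA unit search page (finger IDs,
triplets, helices and an article reference). The example values
are the F1F2 unit shown in Figure 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Publication record (fields are assumed)."""
    article_id: int
    title: str


@dataclass
class ZincFinger:
    """A single finger (fields are assumed)."""
    finger_id: int
    triplet: str  # DNA triplet recognized by the finger
    helix: str    # seven-residue recognition helix


@dataclass
class CodaF1F2:
    """Two-finger CoDA unit covering positions F1 and F2."""
    f1: ZincFinger
    f2: ZincFinger
    article: Optional[Article] = None  # pointer to the Article class


@dataclass
class CodaF2F3:
    """Two-finger CoDA unit covering positions F2 and F3."""
    f2: ZincFinger
    f3: ZincFinger
    article: Optional[Article] = None  # pointer to the Article class


# F1F2 unit from Figure 1; the article reference is left unset here.
unit = CodaF1F2(
    f1=ZincFinger(898, "GGA", "RPSKLVL"),
    f2=ZincFinger(434, "GGC", "LKEHLTR"),
)
print(unit.f1.triplet, unit.f2.triplet)
```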

Novel three-finger arrays and CoDA units

Since the release of the previous version of ZiFDB, OPEN 
has been used to generate numerous additional ZFAs. 
More than 500 OPEN ZFAs are now housed in ZiFDB. 
CoDA arrays are a new array type in this version of 
ZiFDB, and the database currently has 181 CoDA 
arrays. ZiFDB v2.0 also has information about 319 
F1F2 CoDA units and 334 F2F3 units. 

Updated interface 

Search page for CoDA units 

In addition to the ability to search for individual ZFs and
three-finger ZFAs, ZiFDB v2.0 allows users to search for all
available CoDA units that recognize a given target sequence.
When the user provides the DNA sequence of the F1F2 or F2F3
triplets to be targeted, all CoDA units matching the input
sequence are returned. If the user provides the nucleotide
triplets for a 9-bp target site, information is provided about
both F1F2 and F2F3 units (Figure 1).
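
The lookup behavior described above can be approximated with a
small in-memory search. This is a sketch under stated
assumptions, not the ZiFDB query code: the `CodaUnit` type, the
`UNITS` list and the function name are hypothetical, and the
third entry in the list is made up to show a non-matching unit.

```python
from typing import Dict, List, NamedTuple, Optional


class CodaUnit(NamedTuple):
    kind: str    # "F1F2" or "F2F3"
    first: str   # triplet of the unit's first finger
    second: str  # triplet of the unit's second finger


# Toy data set: the first two entries follow Figure 1, the third is made up.
UNITS = [
    CodaUnit("F1F2", "GGA", "GGC"),
    CodaUnit("F2F3", "GGC", "GGG"),
    CodaUnit("F1F2", "GAA", "GCC"),
]


def search_coda_units(
    f1: Optional[str] = None,
    f2: Optional[str] = None,
    f3: Optional[str] = None,
) -> Dict[str, List[CodaUnit]]:
    """Return matching units; F1F2 needs f1 and f2, F2F3 needs f2 and f3."""
    hits: Dict[str, List[CodaUnit]] = {"F1F2": [], "F2F3": []}
    for unit in UNITS:
        if unit.kind == "F1F2" and f1 and f2 and (unit.first, unit.second) == (f1, f2):
            hits["F1F2"].append(unit)
        elif unit.kind == "F2F3" and f2 and f3 and (unit.first, unit.second) == (f2, f3):
            hits["F2F3"].append(unit)
    return hits


# All three triplets of a 9-bp site: both unit types are searched.
print(search_coda_units("GGA", "GGC", "GGG"))
# Only F1 and F2: only F1F2 units can be returned.
print(search_coda_units(f1="GGA", f2="GGC"))
```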



Figure 2. The array submission page. The form has sections for
array information, finger information, journal information
(optional, if the array is published) and submitter
information. Notes on the page state that (1) if the binding
site is 5'-GAA-GCC-GTT-3', then F1 is GTT, F2 is GCC and F3 is
GAA; (2) the comment area is for information about array
activity or any additional information the submitter would
like to disclose; and (3) the submitter's email address will
not be disclosed to anyone.

Page for submitting arrays

The array submission page collects four types of information
(Figure 2):

(1) Array information. The user can input the name given to a
particular ZFA and the array type. As indicated above, the
array type specifies the method used to engineer the ZFA
(i.e. modular assembly, OPEN or CoDA). It is also possible to
designate an array as natural (i.e. found in nature). The
option remains to include engineering methods that may be
developed at a later date. The user can also provide other
useful information such as data concerning array activity.

(2) Finger information. In this section, the DNA triplet,
amino acid sequence of the recognition helix and the amino
acid sequence of the entire ZF are provided for each finger in
a ZFA.

(3) Journal information. If the array has been published, then
the citation is provided.

(4) Submitter information. In this section, the name of the
submitter and/or principal investigator, the name of the
submitter's institution and an email address are provided.
This information is used by the database curator to contact
the submitter if there are any questions concerning the data.
No information about the submitter is publicly disclosed.
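
When filling in the finger information, the order of the
triplets matters: according to the note on the submission page
(Figure 2), for the binding site 5'-GAA-GCC-GTT-3', F1 is GTT,
F2 is GCC and F3 is GAA. The helper below sketches that
convention; the function name and its input checks are
assumptions made for illustration and are not part of ZiFDB.

```python
from typing import List


def fingers_from_site(site_5to3: str) -> List[str]:
    """Split a 5'-to-3' binding site into triplets ordered F1, F2, F3, ...

    F1 takes the 3'-most triplet, so the triplets are returned in reverse
    order of their appearance in the 5'-to-3' sequence.
    """
    site = site_5to3.replace("-", "").upper()
    if not site or len(site) % 3 != 0 or set(site) - set("ACGT"):
        raise ValueError("site must be a non-empty run of A/C/G/T triplets")
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    return triplets[::-1]


print(fingers_from_site("GAA-GCC-GTT"))  # ['GTT', 'GCC', 'GAA']
```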

Newly added features and attributes in the array query output page

In the output page of the array query, a search link for CoDA
units is provided. This makes it convenient for the user to
check whether CoDA units are available for assembling ZFAs
that target the 9-bp DNA sequence they provided. Another newly
added attribute of the array query output is the array type.
This currently includes natural, modular assembly, OPEN and
CoDA. As the success rate and efficacy of ZFAs engineered by
different platforms differ considerably, this information may
be valuable for users when choosing previously engineered
arrays to use in their experiments.



CONCLUSION 

One important new feature in this version of ZiFDB is the
added ability of users to input ZFAs and ZFs into the
database. This will ensure that ZiFDB captures new information
generated by the scientific community, including unpublished
data. In addition, by now housing information about two-finger
CoDA units and CoDA arrays, ZiFDB is current with the most
recent ZFA engineering practices. Finally, the expanded number
of ZFs and ZFAs that are now stored in ZiFDB will provide a
rich resource for users interested in either ZFA engineering
or better understanding how ZFs recognize their target DNA.